17 research outputs found
Empirical Limitations on High Frequency Trading Profitability
Addressing the ongoing examination of high-frequency trading practices in
financial markets, we report the results of an extensive empirical study
estimating the maximum possible profitability of the most aggressive such
practices, and arrive at figures that are surprisingly modest. By "aggressive"
we mean any trading strategy exclusively employing market orders and relatively
short holding periods. Our findings highlight the tension between execution
costs and trading horizon confronted by high-frequency traders, and provide a
controlled and large-scale empirical perspective on the high-frequency debate
that has heretofore been absent. Our study employs a number of novel empirical
methods, including the simulation of an "omniscient" high-frequency trader who
can see the future and act accordingly
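As a minimal sketch of what such an omniscient benchmark computes (the function, toy price path, and cost model below are illustrative assumptions, not the paper's actual simulator):

```python
def omniscient_profit(prices, max_hold, cost_per_share):
    """Profit upper bound for an aggressive trader who sees the future:
    at each step, buy one share only when some price within the next
    `max_hold` steps beats the current price by more than the round-trip
    execution cost, then sell at that best future price."""
    profit = 0.0
    for t in range(len(prices) - 1):
        best = max(prices[t + 1 : t + 1 + max_hold])
        gain = best - prices[t] - 2 * cost_per_share  # cost on entry and exit
        if gain > 0:
            profit += gain
    return profit

# Toy path: even an all-knowing trader nets little once costs are charged.
path = [100.0, 100.4, 100.1, 100.6, 100.2, 100.5]
bound = omniscient_profit(path, max_hold=2, cost_per_share=0.05)
```

Raising `cost_per_share` or shrinking `max_hold` drives the bound toward zero, mirroring the tension between execution costs and trading horizon described above.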
Do price trajectory data increase the efficiency of market impact estimation?
Market impact is an important problem faced by large institutional investors
and active market participants. In this paper, we rigorously investigate
whether price trajectory data from the metaorder increase the efficiency of
estimation, from the asymptotic viewpoint of statistical estimation. We show that,
for popular market impact models, estimation methods based on partial price
trajectory data, especially those containing early trade prices, can outperform
established estimation methods (e.g., VWAP-based) asymptotically. We discuss
theoretical and empirical implications of this phenomenon, and how they could
be readily incorporated into practice
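The contrast between VWAP-based and trajectory-based estimation can be illustrated on a toy linear impact model (the model, parameter values, and function names are illustrative assumptions, not the paper's specification):

```python
import random

random.seed(0)

def simulate_metaorder(n, q, p0, impact, sigma):
    # Toy linear impact: the k-th child fill drifts with the executed fraction.
    return [p0 + impact * q * (k + 1) / n + random.gauss(0.0, sigma)
            for k in range(n)]

def estimate_from_vwap(prices, n, q, p0):
    # VWAP-based: back the impact coefficient out of the average fill price.
    vwap = sum(prices) / n
    return (vwap - p0) / (q * (n + 1) / (2 * n))

def estimate_from_trajectory(prices, n, q, p0):
    # Trajectory-based: least-squares slope over every fill, so the
    # informative early trades enter the estimate individually.
    xs = [q * (k + 1) / n for k in range(n)]
    ys = [p - p0 for p in prices]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

n, q, p0, impact, sigma = 50, 1.0, 100.0, 0.5, 0.1
orders = [simulate_metaorder(n, q, p0, impact, sigma) for _ in range(200)]
vwap_est = sum(estimate_from_vwap(p, n, q, p0) for p in orders) / len(orders)
traj_est = sum(estimate_from_trajectory(p, n, q, p0) for p in orders) / len(orders)
```

Both estimators recover the true coefficient in expectation on this toy model; the asymptotic-efficiency comparison between them is the question the paper studies.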
Modeling Temporal Data as Continuous Functions with Process Diffusion
Temporal data like time series are often observed at irregular intervals,
which is a challenging setting for existing machine learning methods. To tackle
this problem, we view such data as samples from some underlying continuous
function. We then define a diffusion-based generative model that adds noise
from a predefined stochastic process while preserving the continuity of the
resulting underlying function. A neural network is trained to reverse this
process which allows us to sample new realizations from the learned
distribution. We define suitable stochastic processes as noise sources and
introduce novel denoising and score-matching models on processes. Further, we
show how to apply this approach to the multivariate probabilistic forecasting
and imputation tasks. Through our extensive experiments, we demonstrate that
our method outperforms previous models on synthetic and real-world datasets
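The key idea, noise drawn from a stochastic process so that nearby time points receive correlated noise, can be sketched with Brownian noise (a hypothetical simplification; the paper's actual models and parameters differ):

```python
import math
import random

random.seed(1)

def brownian_noise(times):
    """Sample a Brownian path at sorted, irregularly spaced times.
    Increments scale with the square root of each gap, so nearby
    observations get similar noise and continuity is preserved."""
    path, w, prev = [], 0.0, 0.0
    for t in times:
        w += random.gauss(0.0, math.sqrt(t - prev))
        prev = t
        path.append(w)
    return path

def forward_diffuse(values, times, level):
    # One forward step: blend clean function values with process noise.
    noise = brownian_noise(times)
    return [math.sqrt(1 - level) * v + math.sqrt(level) * z
            for v, z in zip(values, noise)]

times = [0.0, 0.1, 0.35, 0.4, 0.9]          # irregular observation grid
clean = [math.sin(2 * math.pi * t) for t in times]
noisy = forward_diffuse(clean, times, level=0.3)
```

With i.i.d. Gaussian noise instead, the two nearby points at t = 0.35 and t = 0.4 would be perturbed independently, destroying the continuity of the underlying function.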
Short-term Temporal Dependency Detection under Heterogeneous Event Dynamic with Hawkes Processes
Many event sequence data exhibit mutually exciting or inhibiting patterns.
Reliable detection of such temporal dependency is crucial for scientific
investigation. The de facto model is the Multivariate Hawkes Process (MHP),
whose impact function naturally encodes a causal structure in Granger
causality. However, the vast majority of existing methods use a direct or
nonlinear transform of the standard MHP intensity with a constant baseline,
which is inconsistent with real-world data. Under irregular and unknown heterogeneous
intensity, capturing temporal dependency is hard as one struggles to
distinguish the effect of mutual interaction from that of intensity
fluctuation. In this paper, we address the short-term temporal dependency
detection issue. We show that the maximum likelihood estimate (MLE) of the
cross-impact from the MHP has an error that cannot be eliminated but can be
reduced by an order of magnitude by using the heterogeneous intensity not of
the target HP but of the interacting HP. We then propose a robust and
computationally-efficient method modified from MLE that does not rely on the
prior estimation of the heterogeneous intensity and is thus applicable in a
data-limited regime (e.g., few-shot, no repeated observations). Extensive
experiments on various datasets show that our method outperforms existing ones
by notable margins, with novel applications highlighted in neuroscience.
Comment: Conference on Uncertainty in Artificial Intelligence 202
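For reference, the standard MHP with exponential kernels and the constant baseline criticized above looks as follows (dimension, parameter values, and event times are hypothetical):

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity of each dimension of a multivariate Hawkes
    process: lambda_i(t) = mu_i + sum over past events (s, j) of
    alpha[i][j] * exp(-beta * (t - s)). The constant baseline mu is
    exactly the assumption the paper replaces with a heterogeneous one."""
    lam = list(mu)
    for s, j in events:
        if s >= t:
            break
        decay = math.exp(-beta * (t - s))
        for i in range(len(mu)):
            lam[i] += alpha[i][j] * decay
    return lam

events = [(0.2, 0), (0.5, 1), (0.9, 0)]   # (time, dimension) pairs
mu = [0.5, 0.3]                           # constant baselines
alpha = [[0.2, 0.4],                      # alpha[0][1]: cross-impact of 1 on 0
         [0.1, 0.3]]
lam = hawkes_intensity(1.0, events, mu, alpha, beta=2.0)
```

Estimating the off-diagonal entries of `alpha` from event times is the cross-impact detection problem; when `mu` actually fluctuates over time, that fluctuation is easily mistaken for excitation.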
Learning to Abstain From Uninformative Data
Learning and decision-making in domains with a naturally high noise-to-signal
ratio, such as finance or healthcare, is often challenging, while the stakes
are very high. In this paper, we study the problem of learning and acting under
a general noisy generative process. In this problem, the data distribution has
a significant proportion of uninformative samples with high noise in the label,
while part of the data contains useful information represented by low label
noise. This dichotomy is present during both training and inference, which
requires the proper handling of uninformative data during both training and
testing. We propose a novel approach to learning under these conditions via a
loss inspired by the selective learning theory. By minimizing this loss, the
model is guaranteed to make a near-optimal decision by distinguishing
informative from uninformative data before making predictions. We build upon
the strength of our theoretical guarantees by describing an iterative
algorithm, which jointly optimizes both a predictor and a selector, and
evaluates its empirical performance in a variety of settings
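A common way to write such a selector-aware objective is the selective risk with a coverage penalty; the version below follows the standard selective-prediction formulation rather than the paper's exact loss, and all numbers are illustrative:

```python
def selective_loss(pred_losses, select_weights, target_coverage, lam):
    """Average prediction loss over the selected samples, plus a penalty
    when the selector keeps less than the target fraction of the data."""
    coverage = sum(select_weights) / len(select_weights)
    selected = max(sum(select_weights), 1e-8)
    risk = sum(l * g for l, g in zip(pred_losses, select_weights)) / selected
    penalty = lam * max(0.0, target_coverage - coverage) ** 2
    return risk + penalty

# Two informative samples (low loss) and two uninformative ones (high loss).
losses = [0.1, 0.1, 1.0, 1.0]
keep_informative = [1.0, 1.0, 0.0, 0.0]
keep_everything = [1.0, 1.0, 1.0, 1.0]
```

Minimizing this jointly over predictor and selector rewards abstaining on the noisy samples: here selecting only the informative half scores 0.1, versus 0.55 for selecting everything, at a coverage target of 0.5.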
Provably Convergent Schr\"odinger Bridge with Applications to Probabilistic Time Series Imputation
The Schr\"odinger bridge problem (SBP) is gaining increasing attention in
generative modeling and showing promising potential even in comparison with the
score-based generative models (SGMs). SBP can be interpreted as an
entropy-regularized optimal transport problem, which alternately projects onto
the two marginal constraints. In practice, however, only approximate
projections are available, and their convergence is not well understood. To
fill this gap, we present the first convergence analysis of the Schr\"odinger
bridge algorithm based on approximate projections. As for its practical
applications, we apply SBP to probabilistic time series imputation by
generating missing values conditioned on observed data. We show that optimizing
the transport cost improves performance and that the proposed algorithm achieves
state-of-the-art results on healthcare and environmental data, while exhibiting
the advantage of exploring both temporal and feature patterns in probabilistic
time series imputation.
Comment: Accepted by ICML 202
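In the discrete entropic-OT view, the alternating projections are the Sinkhorn/IPF updates; the tiny example below is a generic sketch of that scheme, not the paper's algorithm:

```python
import math

def sinkhorn(cost, a, b, eps, n_iter):
    """Entropy-regularized OT by alternately projecting onto the two
    marginal constraints (the u- and v-updates below). In practice each
    projection is only approximate, which is the gap the paper analyzes."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(cost, a=[0.5, 0.5], b=[0.5, 0.5], eps=0.1, n_iter=200)
row_sums = [sum(row) for row in plan]
```

As `eps` shrinks, the plan concentrates on the cheap diagonal cells, approaching the unregularized transport solution.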
Electronic Market Making: Initial Investigation
This paper establishes an analytical foundation for electronic market making. Creating an automated securities dealer is a challenging task with important theoretical and practical implications. Our main interest is a normative automation of the market maker's activities, as opposed to explanatory modeling of human traders, which was the primary concern of earlier work in this domain. We use a simple class of "non-predictive" trading strategies to highlight the fundamental issues. These strategies have a theoretical foundation behind them and serve as a showcase for the decisions to be addressed: depth of quote, quote positioning, timing of updates, inventory management, and others. We examine the impact of various parameters on the market maker's performance. Although we conclude that such elementary strategies do not solve the problem completely, we are able to identify the areas that need to be addressed with more advanced tools. We hope that this paper can serve as a first step in a rigorous examination of the dealer's activities, and will be useful in disciplines outside of Finance, such as Agents, Robotics, and E-Commerce
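One of the simplest strategies in this non-predictive class can be sketched as a quoting rule that centers on the mid-price and skews against inventory (the rule and every parameter below are illustrative, not the paper's calibration):

```python
def make_quotes(mid, spread, inventory, skew_per_unit, size):
    """Non-predictive quoting: ignore any price forecast, center both
    quotes on the current mid, widen by a fixed half-spread, and shift
    them against inventory so the dealer mean-reverts toward a flat book.
    Returns (bid, ask, quoted size) -- i.e., positioning and depth."""
    center = mid - skew_per_unit * inventory
    bid = round(center - spread / 2, 2)
    ask = round(center + spread / 2, 2)
    return bid, ask, size

# Long 3 units, so both quotes shift down to encourage selling inventory off.
bid, ask, size = make_quotes(mid=100.00, spread=0.10, inventory=3,
                             skew_per_unit=0.01, size=100)
```

Even this rule already exposes the decisions the abstract lists: `spread` (quote positioning), `size` (depth of quote), `skew_per_unit` (inventory management), and how often to recompute (timing of updates).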
Reinforcement learning for optimized trade execution
We present the first large-scale empirical application of reinforcement learning to the important problem of optimized trade execution in modern financial markets. Our experiments are based on 1.5 years of millisecond time-scale limit order data from NASDAQ, and demonstrate the promise of reinforcement learning methods for market microstructure problems. Our learning algorithm introduces and exploits a natural "low-impact" factorization of the state space.
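A toy version of the setup is tabular Q-learning over the private state (time remaining, inventory remaining), with a synthetic price-impact model standing in for the NASDAQ data (everything below is a hypothetical sketch, not the paper's algorithm or parameters):

```python
import random

random.seed(0)

T, V = 4, 4                     # sell V shares within T decision steps
Q = {}                          # tabular action values on (t, inv, action)
alpha, eps = 0.1, 0.2           # learning rate and exploration rate

def greedy(t, inv):
    return max(range(inv + 1), key=lambda a: Q.get((t, inv, a), 0.0))

for _ in range(3000):
    t, inv = T, V
    while t > 0:
        if t == 1:
            a = inv                         # deadline: finish the order
        elif random.random() < eps:
            a = random.randint(0, inv)      # explore
        else:
            a = greedy(t, inv)              # exploit
        price = 100.0 - 0.1 * a + random.gauss(0.0, 0.05)  # toy impact
        reward = a * price                  # revenue from this slice
        future = 0.0 if t == 1 else max(
            Q.get((t - 1, inv - a, b), 0.0) for b in range(inv - a + 1))
        q = Q.get((t, inv, a), 0.0)
        Q[(t, inv, a)] = q + alpha * (reward + future - q)
        t, inv = t - 1, inv - a

# Greedy rollout: the learned policy always completes the order by T.
t, inv, sold = T, V, 0
while t > 0:
    a = inv if t == 1 else greedy(t, inv)
    sold += a
    t, inv = t - 1, inv - a
```

Under this linear temporary impact, spreading the order across steps earns more than dumping it at once; factoring market variables (spread, book imbalance) into the state alongside the private pair (t, inv) is where a low-impact factorization of the state space becomes useful.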